The Annals of Statistics 

2005, Vol. 33, No. 6, 2967-2999 

DOI: 10.1214/009053605000000705 

© Institute of Mathematical Statistics, 2005 
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Bayesian methods are developed for the muhivariate nonpara- 
metric regression problem where the domain is taken to be a compact 
Riemannian manifold. In terms of the latter, the underlying geome- 
try of the manifold induces certain symmetries on the multivariate 
nonparametric regression function. The Bayesian approach then al- 
lows one to incorporate hierarchical Bayesian methods directly into 
the spectral structure, thus providing a symmetry-adaptive multi- 
variate Bayesian function estimator. One can also diffuse away some 
prior information in which the limiting case is a smoothing spline 
on the manifold. This, together with the result that the smoothing 
spline solution obtains the minimax rate of convergence in the multi- 
variate nonparametric regression problem, provides good frequentist 
properties for the Bayes estimators. An application to astronomy is 
included. 

1. Introduction. This paper develops Bayesian function estimation for 
the multivariate nonparametric regression problem where the domain is 
taken to be a compact Riemannian manifold. The approach is to incorpo- 
rate multivariate hierarchical Bayesian methods into the spectral structure 
of the Riemannian manifold, allowing one to explicitly capture any prior 
information about possible invariance or symmetry in the data. In partic- 
ular, a symmetry-adaptive multivariate Bayes estimator is proposed. This 
approach is a very natural way of modeling symmetries and presents a su- 
perior alternative to a frequentist approach. We now discuss this below. 

The sample space is usually taken as Euclidean. However, there has been 
interest in non-Euclidean sample spaces, with the main example being the 
unit sphere in various dimensions. Without going into a discussion of the his- 
tory of directional statistics, for that one can consult a very modern account 
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of the subject in [32], let us provide some details as to the general interest in 
the subject of a Riemannian sample space, that is, the sample space being a 
Riemannian manifold, along with the statistical study of certain symmetries 
inherent in the manifold. In some applications these symmetries often arise 
due to certain physical constraints imposed by laws of motion. 

The greatest number of investigations into the statistical study of sym- 
metries on a Riemannian sample space involves testing. An early paper 
by Beran [4] investigates testing for uniformity on compact homogeneous 
spaces, followed by generalizations to compact Riemannian manifolds in 
[16] and investigation into the two-sample problem on Riemannian man- 
ifolds by Wellner [48]. Extensions to specific but more general manifolds 
have been documented in [8], Chapter 6, which include the case of Grass- 
mann and Stiefel manifolds. Recently, Chikuse and Jupp [9] consider tests of 
uniformity on shape space. Now one can consider testing for uniformity on 
a Riemannian manifold as the ultimate form of symmetry. However, "par- 
tial" symmetries, such as rotational invariance around a particular axis on a 
unit sphere, can be of even greater interest in certain physical applications. 
Starting within the framework of Gine [16], Jupp and Spurr [23] examine 
testing for symmetries that are not necessarily the full symmetry of unifor- 
mity, but only partial symmetries. In more technical terms, which will be 
made precise below, testing for uniformity can be associated with study- 
ing invariance with respect to the full group of isometrics on the manifold, 
whereas partial symmetries can be associated with studying invariance with 
respect to proper subgroups of the full isometry group. 

For multivariate function estimation on Riemannian manifolds, there are 
some investigations where the primary approach is frequentist. There are a 
number of works on multivariate function estimation on a unit sphere, see 
[5, 18, 19, 25, 26, 28, 41, 44, 45, 46]; on Stiefel manifolds, see [29] and [8], 
Chapter 10; on Lie groups, see [24, 27]; and, on general compact manifolds, 
see [20, 27]. Other, more exotic analyses have been done in [46], Chapter 2, 
where multivariate functions are estimated on submanifolds of the tangent 
bundle manifold of a sphere. There is also some work involving fitting smooth 
curves on a manifold; see [17, 21, 36]. In some remarkable engineering appli- 
cations to polymers and robotics, Chirikjian and Kyatkin ([10], Chapters 12 
and 17) examine multivariate function estimation on the Euclidean motion 
group. In computer vision and pattern representation, the sample space is 
taken to be a certain space of complex matrices, with a pattern being defined 
through symmetries on this manifold; see [42, 43]. Thus, one can see that 
there is interest in multivariate function estimation on sample spaces beyond 
the Euclidean space, and together with current computing capabilities, it is 
foreseeable that the demand for statistical techniques that go beyond the 
traditional Euclidean sample space will increase. 
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The statistical interest in testing for symmetry, partial or full, and in 
frequentist function estimation on Riemannian manifolds is, nevertheless, 
disjoint, although it is conceivable that there would be advantages to com- 
bining them if possible. Indeed, a frequentist approach would involve one 
of the following options: assume symmetry (whatever form) is present in 
the data; ignore any symmetry altogether; or initially test for any symme- 
tries using the methods established in [23], followed by function estimation 
depending on the outcome of the test. One can see that the above three 
frequentist approaches have certain difficulties. The first approach runs into 
the difficulty that symmetry may not really be present, while the second 
approach has the difficulty that symmetry may be present. The third ap- 
proach has difficulties in interpreting mean squared error calculations since 
one is conditioning on that part of the sample space that rejects (or accepts) 
the symmetry in question. Thus, one can see there are some shortcomings in 
the frequentist approach. Alternatively, a Bayesian approach to multivariate 
function estimation on manifolds appears to be a superior solution in that 
the Bayesian approach allows one to exploit any invariance or symmetry in 
the data directly, by eliciting very specific prior information on the possible 
symmetries in question through the spectral structure on the Riemannian 
manifold. It is in this way that we mean (mentioned in the opening para- 
graph) that the Bayesian approach is natural for this class of problems and 
is the subject of this paper. 

We now provide a summary of what is to come. In Section 2 notation 
and some geometric preliminaries are provided. As well, some explicit de- 
scriptions of the manifolds discussed above are presented. In Section 3 we 
initiate a frequentist approach to the multivariate nonparametric regression 
function estimation problem. This defines the minimization problem in a re- 
producing kernel Hilbert space; see [46], Chapters 1 and 2. We show that a 
unique solution defines what may be termed a spline on a manifold which, in 
turn, confirms a conjecture raised by Wahba [45]. It is shown that the spline 
solution attains the minimax rate of convergence. As a precursor to the 
next section, we generalize a result which shows that the spline solution has 
Bayesian connections when diffuse priors are used. In Section 4 we formally 
embark upon the task at hand by incorporating initial prior information into 
the model. We treat the model as involving both a symmetric and nonsym- 
metric part, assume normality on the first stage finite-dimensional priors and 
treat the infinite-dimensional part of the model as a nuisance parameter. We 
can then control the amount of symmetry by controlling the variance terms 
and employ hyperpriors to deal with the prior parameters. This leads to 
symmetry-adaptive hierarchical Bayesian function estimators. Bayes factors 
are subsequently used to determine the truncation level. As for dealing with 
the nuisance parameters, we can diffuse some of them away and in so doing, 
we can obtain as limits the smoothing spline solutions on manifolds. This 
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suggests that the hierarchical Bayes estimator has good frequentist proper- 
ties, which are discussed in Section 5. In Section 6 we go through a detailed 
analysis for the 2-sphere, as well as present some numerical work on long 
period cometary orbits. It is here that we see the benefits of incorporating 
prior symmetries into the model. This extends an earlier study of this data 
set; see [23]. Section 7 contains the proofs and an appendix is included for 
further needed technical details. 

2. Notation. Let M be a compact connected orientable Riemannian man- 
ifold. Consider the Riemannian structure {g{p) -p € M} and let dx be the 
normalized volume element of M associated with this structure. For each 
fixed p G M, we can associate with g{p) a matrix {gij{p)) called the metric 
tensor. We will, in addition, assume that the manifold is without boundary, 
although we could generalize the following arguments to certain boundary 
conditions. 

Let C°°(M) be the space of real- valued infinitely differentiable continuous 
functions on M. Denote by 



the Laplace-Beltrami operator on M, where dj denotes the partial derivative 
with respect to the jth component, {g^-'ip)) is the inverse of {gij{p)) the 
metric tensor and \g{p)\ is the determinant of the matrix (gij{p)). We note 
that A is an elliptic self-adjoint second-order differential operator on C°°(M) 
for which the eigenfunctions of A are a complete orthonormal basis for 
£^(M), the space of real- valued square integrable functions. 

Let A be an eigenvalue of A. The collection of all eigenvalues for a given M 
is countably infinite, hence, letting No = {0, 1,2,.. .}, we can enumerate the 
eigenvalues by Afe > 0, k £ No, with no upper bound. Furthermore, we will 
use the convention that Aq = and that A^ < A^+i for A; € No. For each 
k, let (pii-j be an eigenfunction so that Ac/^^-j = Xk4>kji ^or j = 1, . . . , L^, and 
denote by £k = sp{(j)kj : j = 1, . . . , Lk}, where sp(-) stands for the span of the 
object in question. Then dimif^^ = < oo, k G Nq, where dim(-) denotes the 
dimension of the object in question. 

Let (j)k = {4>ki, • • ■ 5 4'kdimSky , where superscript "/" denotes transpose and 
let denote the dot product on M'i'^^fe, with || • \\k the induced norm, 

A; G No. For h G £^(M), the eigenfunction expansion will be defined by 




oo 




(2.1) 




for k gNq, where integration over M is defined piecewise using the usual 
partition of unity argument; see (A.l) in the Appendix. 
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We can consider subspaces of £^(M) in the following way. First, on the 
space C°°(M) of infinitely continuous differentiable functions on M, consider 
the Sobolev norm || • H//^ of order s defined accordingly. For any function 
h = Y.k{hk-,4>k)k, let 

(2.2) II/^IIh.= E ^^ll^^^lli- 

Afc>0 

One can verify that (2.2) is indeed a norm. Denote by Hs{M) C ^^(M) the 
(vector-space) completion of C°°(M) with respect to (2.2). This will be called 
the Sobolev space of order s > dimM/2. In addition, we will also consider 
Sobolev ellipsoids defined by 

i/,(M,M) = jtx G i/,(M) A|||%||2 < m|, 

for M > and s > dimM/2. 

Often M is equipped with certain symmetries. A Riemannian manifold 
is homogeneous if its group of isometrics G acts transitively on M, where 
by the latter we mean that for every p,q £M. there exists an h £ G such 
that p = hq, where multiplication denotes the group action G x M — > M. For 
every p G M, let G^ = {h£G:hp = p} denote the isotropy subgroup of p. It 
is well known that if M is a homogeneous compact connected Riemannian 
manifold, then, for every p G M, G^ is a closed subgroup of G and there exists 
a diffeomorphism between the quotient space G/G^ and M. A differentiable 
function / : M — > M is called a zonal function with respect to the action of the 
isotropy subgroup C, p G M, if it is constant on the isotropy subgroup G^. 

New manifolds can be created by taking products of existing manifolds, as 
well as by taking quotients as in the case of a homogeneous space. For con- 
creteness, let us consider specific examples that will be used for illustration 
throughout the paper, as well as those mentioned in Section 1. 

Example 2.1 (Sphere). The sphere SP~^ C W is the set of unit vectors 
in p-dimensional Euclidean space. In the case where p = 3, we note that any 
point on S'^ can almost surely be represented by 

(2.3) uj = (cos(^sinT?,sin93sinT9,cos'!?)', 

where ip £ [0,27r), -i? G [0,7r) and superscript "/" denotes transpose. 

Example 2.2 (Orthogonal and special orthogonal group). The orthog- 
onal group 0{p) consists of the space oi p x p real orthogonal matrices. 
However, this group is not connected. The connected component consisting 
of those real orthogonal matrices having determinant equal to unity, SO (p) , 
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is called the special orthogonal group. Again, in the case of p = 3, 5*0(3) 
can be represented in the following way. Let 

(cosy? —simp 0\ / COS"!? sin-i^X 

sinp cosp , a('i?) = 1 > 
1/ V-sini? cos??/ 

where ip £ [0, 27r), -d £ [0, vr). The well-known Euler angle decomposition says 
any element of 50(3) can almost surely be uniquely written as 

g = u{pi)a{'d)u{p2), 
where pi £ [0, 27r), p2 £ [0,2tt), ^ £ [0,7r). 

New manifolds can be created from products and quotients of these ex- 
amples. In fact, it turns out that 5*0(3) is the transitive group of isometries 
on S*^. Furthermore, the subgroup 

S0{2) = {u{p):p£ [0,27r)} 

of 50(3) is the isotropy subgroup of (0,0, 1)' £ S^. Thus, we can identify 5^ 
with the quotient space 50(3)/50(2); hence, 5^ is a homogeneous space. 
Throughout the paper we will use the sphere 5^, as well as its transitive 
group of isometries, 50(3), for illustrative purposes leading up to the appli- 
cation in Section 6. 

Some of the other manifolds previously mentioned include the Stiefel man- 
ifold, Vfc(MP) = 0{p)/0{p - k), the Grassmann manifold, Gk{W) = 0{p)/ 
{0{p) X 0{p — k)), and shape space, 5P('=-i)-V50(p). Of the more exotic 
constructions, the collection of all tangent spaces on a manifold is called 
the tangent bundle and is itself a manifold. The Euclidean motion group is 
defined to be SO{p) t< M^, where ix denotes a semi-direct product. In com- 
puter vision, the sample space is taken to be 5L(2,C)/0(2), where 5L(2,C) 
denotes the space of 2 x 2 complex matrices of determinant 1. We note that 
the last two examples are that of noncompact manifolds. 

As for orthonormal bases, we have the following example. 



Example 2.3 (Spherical harmonics). Let 
V2 



\2k + l){k-qy.^. 



(2.4) Yk,{uj) = < 



'{2k + lj . 



ATr{k + q)] 
Po'(cos??), 



R (cosi9) cos{qp), 



V2j (^^.+.!)^.^,-'!^'^ ^,(cos^)sin(M^), 



ATT{k + \q\)\ 



q = l,...,k, 

q = 0, 

2 = -l,...,-k, 
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where tp G [0,27r), i9 G [0,7r) and Pj^ are the Legendre functions, —k<q<k 
and G Nq. We note that we can think of (2.4) as the vector entries to the 
{2k + l)-vector 

Ykiiv) = (n,(^)), 

\q\ < k and k G Nq- In this situation {Y^g - {ql < k,k £ Nq} are the eigenfunc- 
tions of the Laplace-Behrami operator on 5^ with eigenvalues = k{k + l), 
/c G No, and, hence, form a complete orthonormal basis over 

Some further technical properties are provided in the Appendix. In ad- 
dition, the following asymptotic notation will be used. Let {a^} and 
denote two real sequences of numbers. We write <C bn to mean a„ < Cbn 
for some C > as n — > oo, the Vinogradov notation. This notation is more 
convenient than the "big oh" Landau notation since expressions that are 
within an order of magnitude can be long and, furthermore, can create con- 
fusion with the notation for the orthogonal group. We will, however, use 
the notation a„ = o(6„) to mean a„/6„ ^ as n ^ oo. Furthermore, a„ x bn 
whenever a„ <^ bn and bn<tian, and a„ ~ 6„ when a„/6„ — > 1 as n — > oo. 

3. Nonparametric regression and splines on manifolds. Let / G C'^{M.). 
Its eigenfunction expansion, as defined in (2.1), is 

oo 

(3.1) f{x) = Y,{^k,Mx))k, 
for x G M and where 

(3.2) 7fc=/ f{x)(t>k{x)dx, 4>kG£k, 

JM 

for k = 0,1, If we observe (3.1) at the points xi, . . . ,x„ G M, then our 

observations are 

(3.3) yi = f{xi) + ei for i = 1, . . . ,n, 

where we assume e = (ei, . . . , £n)' ~ -/V(0, cr^/), o"^ > 0, and we are interested 
in estimating /, a real- valued function on M. We note that the Fourier 
coefficients in (3.1) and (3.2) are denoted by 7^ and not fk, k G No, as was 
done in Section 2. The purpose for this departure is due to the fact that the 
Bayesian framework will later treat the coefficients as random quantities. 

For any fixed value of K > Q, called the truncation level, (3.1) can be 
written as 



(3.4) 



f{x)= fK{x) + ll{x), 
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where 

K oo 

(3.5) fK{x)=^{-ikAk{x))k and i]{x) = ^ {lkAk{x))k- 

k=0 k=K+l 

Then we can write the regression problem (3.3) as 

(3.6) y = $7 + r? + e, 

where y = (yi, . . . , y„)', r] = {r]{xi), . . . ,r]{xn))' , 7 = (7i, ■ • ■ ,7«:)', 1^ = 
X;|Lo dim = (£!,...,£.„)' w.d^ = {(t)k{xi)) for A: = 0, 1, . . . , K, z = 1, . . . , n. 

3.1. Splines on manifolds. We wih need the foUowing notation. Let xi, . . . , x. 
rr G M, and define 

Q^(x,i,Xi2)= A^'(0fc(x,J,(/.fc(x,2))fc. 

k>K+l 

Define the n x n matrix 

where Xi^^Xi^ G M, ii , ^2 = 1, • ■ • , C > and /„ is the n x n identity matrix. 
Furthermore, define the k x 1 vector ^{x) = {(pkix)), and define the n x 
1 vector q(x) = [Q'^{xi,x), . . . ,Q^{xn,x)]' . The following generalizes earlier 
multivariate spline smoothing methods of Wahba [45], Cox [11] and Taijeron, 
Gibson and Chandler [41]. 



Theorem 3.1. Let Wl be a compact connected orientahle Riemannian 
manifold. Assume xi, . . . ,Xn are distinct points on M., x e M, and consider 
the following smoothing problem: 

1 " r 
min -y^{u{xi) -Vif +U \A'/^uix)\'^dx, 

where ^ > 0, s> dim(M)/2 and G M for i = 1, . . . ,n. Define 

f^{x)=^{x)'d + q{x)'c, 
where the n x 1 vector c and the k x 1 vector d are defined by 

c=[<r'(/n-<^(^'[<r^f)-^<i>'[Q^,r')y' 
d=mQi^]-''^r'nQi^]-'y, 

with y = (yi, . . . , y„)'. Then f^{x) for ^ > is the unique solution to the 
smoothing problem. 
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Remark 3.2. The practical choice of the smoothing parameter ^ > can 
be determined by using generaUzed cross- vahdation similar to that outlined 
for the Euclidean case in, for example, [46], Chapter 4. 

Remark 3.3. As stated in Section 1, one can always create new mani- 
folds by taking products of existing ones. This leads to tensor splines as the 
multivariate functions to be estimated over product manifolds. Such is the 
approach taken in [31], which is a special case of the more general tensor 
spline construction; see [46], Chapter 10. In particular, one can freely con- 
struct any number of products of compact manifolds since the individual 
reproducing kernels can be amalgamated into one. Thus, for example, one 
can take products of various spheres, rotation matrices, Stiefel and Grass- 
mann manifolds. 



3.2. Minimaxity. In terms of assessing the spline estimator of Theorem 
3.1, we are led to ask about the type of frequentist properties they possess. 
Details of minimaxity for spline estimators have been investigated in the 
univariate case by Speckman [40], along with an extension to a multivariate 
framework by Cox [11, 12]. 

We first state the following sharp lower bound result which is directly 
related to Theorem 1 of [35] and is stated as Theorem 2.1 of [14]. 

Theorem 3.4 (Pinsker and Efromovich). Let M. be a compact connected 
orientable manifold without boundary. Then 

inf sup E||/-/||2 

/ feHsiM,M) 

> + o(l)) 

as oo, where the infimum is taken over all estimators, s > dimM/2, 

volM 

W = W(M) = (2^)dimMr(i + dimM/2)' 
p = p(s, dimM) 

2^ X 2s/(2s+dimM) / 2^ _|_ ^imMX 

,2s + 2 dimM j V dimM ) 

vol(-) denotes volume and r(-) denotes the gamma function. 

Remark 3.5. We emphasize that this lower bound is over all estimators. 
The constant W is a geometric invariant associated with asymptotic calcula- 
tions performed by Hermann Weyl; see (A. 2) in the Appendix. The constant 
p is associated with asymptotic calculations performed by Pinsker [35]. 
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The existence of the spHne solution only requires the design points xi, . . . , 
a;„ G M to be distinct. More, however, will be required in order to calculate 
the integrated mean squared error. In the Euclidean univariate case, Speck- 
man ([40], page 972) states the necessary condition on the design needed to 
achieve the minimax lower bound. For the multivariate case, the assump- 
tions are listed in [11], pages 791 and 792. In both of the above univariate 
and multivariate cases, the domain in question is embedded in a Euclidean 
space of the same dimension. This, however, is not possible for a general 
compact manifold; hence, the assumptions necessary on the design points 
xi, . . . ,Xn G M must satisfy local versions of the assumptions of Cox ([11], 
page 791). 

Let us state the four assumptions for the Euclidean case. Assumption 1 is 
just the statement of the nonparametric regression model (3.3). For U C M'^, 
a bounded open set, consider the points t = {ti,... jid)' ,tk = {tki, ■ ■ ■ jtkd)' G 
U cM'^, k = 1, . . . ,n. Define the empirical distribution function in the usual 

way, 

Fn{t)= n-\ 

where summation occurs over every coordinate, j = 1, . . . ,d and k = 1, . . . ,n. 
Let F{t) denote the cumulative distribution function for ii, . . . , t„ and define 

bn = SUp\Fn{t)-F{t)\. 

t 

Assumption 2 states that the smoothing parameter, ^ > 0, must satisfy 
i G [^n, H„], U < H„, hm = lim H„ = 0. 

n^oo n— >oo 

Furthermore, assumption 3 says that the density of F{t) must be bounded 
away from and infinity, while assumption 4 requires the open set [/ C M*^ 
to be bounded simply connected with a smooth boundary. 

The local version of assumptions 2, 3 and 4 can go as follows. By the 
definition of a manifold, for every point p G M, there exist an open set Oa C 
M, a local diffeomorphism V« : ^ V'a(Ca) C RdimM ^nd p G O^. The pair 
{Oa,ipa) is called a chart, and the collection of all charts is called an atlas 
if it covers M. The atlas is defined in greater detail in the Appendix. The 
local version of assumptions 2 and 3 of [11], page 791, is, therefore, that these 
assumptions take place on every chart. Integration on M requires a partition 
of unity V = {6a '■ a G A}, subordinate to the atlas where (5q, : M — > [0, 1] with 
the support, supp^Q, C Oa, a G A; see the Appendix. The local version of 
Cox's assumption 4 is that the boundary of supp^Q, is smooth. To give these 
three assumptions a name, we will say that xi, . . . ,Xn G M locally satisfies 
the Cox assumptions. We have the following result. 
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Theorem 3.6. Suppose xi, . . . ,Xn G M locally satisfies the Cox assump- 
tions. //^>cri-2''/(2^+dimM)^ ^/jg^ 

]E||/|» _ /||2 ^ j^-2s/(2s+dimM) 

as n —> OO, where f G Hg{M., M) for some M > and s > (dim M) (5 dim M — 
2)/4. 

Thus, in terms of the rate of convergence, the sphne estimator of Theo- 
rem 3.1, with stronger order of smoothness, achieves the lower bound. We 
suspect that the required order of smoothness can substantially be reduced 
and this will be pursued elsewhere. 

3.3. Splines as Bayes estimators with diffuse priors. Although smooth- 
ing splines are a general computational method for function fitting, there is, 
however, a Bayesian interpretation. This Bayesian approach to smoothing 
splines on the unit interval is discussed in [46], Chapter 1. In the following 
we adapt that approach for M partly to generalize Theorem 1.5.3 of [46], 
but mainly because subsequent hierarchical modeling builds from this earlier 
work. 

To generalize Theorem 1.5.3 of [46], we need to consider the concept of 
a random field X on M. In general, the random field can be expanded in 
terms of the eigenfunctions so that 

Xip)=J2{Zk,Mp))k, P(^M, 

k 

where is a sequence of independent, mean zero, dimiSfc-dimensional ran- 
dom vectors, with each coordinate having variance a^, k gNq. If, in addition, 
each Zfc, G No, is normally distributed, we say that the process X is Gaus- 
sian with covariance kernel 

EX{p)X{q) = 4{Mq), Mp))k, 

k>0 

for p,q gM. and Ck^O, k gNq. We have the following result. 

Theorem 3.7. Let X{x) be a mean zero real-valued Gaussian random 
field defined on M with covariance kernel 

oo 

Q^{xi,X2)= K''i^k{xi),(l)kix2))k, 
k=K+l 

for xi,X2 GM., for some K >0 and s > dimM/2. Consider 

K 

f{x) = Y{lk,Mx))k + TX{x) for X em andT>0, 

k=0 
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and suppose we observe f at the points xi, . . . ,Xn G M. Let our observations 
be 

yi = f{xi)+ei for i = l,. . . ,n, 

where e = (ei,...,e„)' ~ N{0,a^I), o"^ known. Suppose 7fc|T^ ~ -^(0, 
r^z/Jdimffe); t'^ cind u known, for k = 0,1, . . . , K . Consider the Bayes so- 
lution 

fu{x) =E(/(x)|yi,...,y„) 

and suppose f^{x) is the solution to the smoothing problem with ^ = cr^/ (nr^) . 
Then for each fixed x G M, 

lim f,{x)=f^{x). 

4. Symmetry and Bayesian modeling. A Riemannian manifold can ex- 
hibit symmetries which we want to directly capture in the modeling pro- 
cess. Indeed, as in [23], let & be a subgroup of the isometry group of M; 
see Section 2. We say that / G £^(M) is invariant under the action of if 
f{gx) = f{x) for all g G and x G M. In terms of the eigenstructure, for 
any A; G Nq the eigenspace £k decomposes into two orthogonal subspaces. 
Denote by £^ the eigenfunctions in £k that are invariant with respect to , 
which will be referred to below as having G^-invariance and Sl. , its orthogo- 
nal complement in £i^, which will be referred to below as non-G^-invariance. 
This allows us to write 

(4.1) £k = £l®£l 
and inner products to be written as 

(v). = (v)^ + (v)^ 

for k G No. 

Let / G £^(M). Thus, to exhibit this G'^-invariance let us rewrite (3.1) as 

oo 

(4.2) f{x) = Y.{{^Uu{x))l + {il,M^))l}, 

k=0 

for X G M and where 

7fc=/ f{x)(j)k{x)dx, 4>k&£i, 
Jm 

for J =0,1, /cGNq. 

Remark 4.1. We would like to remark that the splitting up of the sums 
in terms of the G'^-invariant part allows us to later incorporate explicit prior 
assumptions of G'^-invariance. We note that if one assumes that G*^ is the 
trivial subgroup, then £^ = £k, lience £l = {0} for all A; G Nq. 
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Example 4.2. In terms of S'^, we can consider the subgroup to be SO (2) 
of the full transitive group S0{3). Thus, for a function — >M, 5*0(2)- 
invariance would mean that / is a zonal function and thus only depends on 
'd G [0,7r). In terms of (4.1), by looking at the definition of (2.4), one can see 
that, for £^(5*^), S'0(2)-invariance would mean 

(4.3) £k={yko} and Sl = {Ykg : 1 < \q\ < k} 
for each k gNq. 

There are two approaches for dealing with the parameters -jk for k > 
K + 1. One can engage in eliciting very informative prior information in 
order to estimate the r/j's; see [1, 2]. Alternatively, one can adopt an approach 
wherein rji are combined with the measurement errors £i for i = 1, . . . , n; see 
[3] . The latter approach is truly Bayesian, but at the inference stage we can 
treat these r/j as nuisance quantities and eliminate them by integrating out 
(rather than estimating or diffusing) the corresponding parameters. In this 
section we will deal with the latter approach. The former approach will be 
discussed in Section 5 since this method of analysis allows one to make direct 
comparisons with splines. 

4.1. Eliciting prior information. Our prior belief in the G^-invariance of 
(3.1) under the subgroup G", which is explicitly invoked in (4.2), can be 
captured using a mixture normal model, that is, 

(4.4) 7i>' ~piv(o,r2rf ) + ii-p)Nioyri'), 

for some > 0, where T-jJ' = diag(/3^^), £ = 1, . . . ,dimSk, k G No, j = 0, 1, 
r = 0,1, and p models our prior belief that the G'^-invariance assumption is 
true. 

The G^-invariance assumption is also taken into account by having smaller 
variances. Indeed, the components of the variance of 7^ will be smaller than 
that of the components of the variance of 7^. In addition, we would like 
to have finite variance for the rji^s. All of these properties can be obtained 
by assuming Plj < j3^j and /3^^ = for £ = 1, ... , dimS^, k = I, . . . ,K. Fur- 
thermore, assume Pil Pi] < A^Mor = + 1, . . . , j = 0, 1, where the A^'s 
are the eigenvalues of the Laplace-Beltrami operator A on M defined in 
Section 2. 

Once a joint prior distribution is specified for and (the hyperparameter) 
T^, the prior model is complete. Note further that, since later we assign a 
second stage prior on the variance factor r^, their marginal prior distribution 
will no longer be normal, but a heavier tailed distribution ensuring a certain 
degree of prior robustness to our estimator (cf. [6], Chapter 4). 
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Consider the k x 1 vector 7 = (7^^ for k = 0,1, . . . , K and j = 0, 1. The 
prior specified above indicates that 

7|t2 ~piV(0,r2r°) + {l-p)N{0,T^r^), 

where the k x k covariance matrices 

where r = 0, 1, with the direct sums being taken over k = 0,1, . . . ,K and 
J = 0,1. 

For the remainder term, write 

00 
k=K+l 

for xGM, j = 0, 1. Now 

{rji,..., rjiYlr' ~ pN{0, r^Q^^') + (1 - p)N{0, t'Q^^'), 

where Q^^'^ = (Q^J"(xi,,XiJ), and 

00 

(4.5) Q^^''{xi^,Xi2) = {(t'k{xi^),^k'4'k{xi2))l, 

k=K+l 

where fif = d\ag{[i{\), £ = !,.. . ,dimf^, k = K + 1, . . . , j = 0,1, r = 0,1, 
Xi-^,Xi2 € M and ii,i2 = 1, ■ ■ ■ ,n . We have the fohowing result. 

Lemma 4.3. Suppose < /3^^ < for r = 0,1, j = 0,1, £ = 1,..., 
dimSl, k = K + 1, . . . , and s > dim(M.)/2, where the X^sarethe eigenvalues 
of the Laplace-Beltrami operator A on M. Then 

\Cov{4^,ri^)\<T'C{M,s), 

fori\,i2 = 1, . . . ,n, j = 0,1, where C(M, s) < 00 is a constant depending only 
on M and s > dim(M)/2. 

4.2. The posterior. Consider the n x k design matrix 

'^=mxi)), 

where k = 0,1, . . . , K , r = 0,1 and i = 1, . . . ,n. Then we obtain the fohowing 
structure. Given 7, and r^, we have the following linear model for the 
observations 1/ = (1/1, . . . , 

(4.6) y = ^>7 + ti, 
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where u ~ N{0, S) with S = cj2/„ + r2Q^(p), where Q^{p) =p(Q™°©Q™°) + 
(1 e Q^^"). This foUows from the fact that 

y|7,7?,cT2,T2~iV($7,a2/„) and 7]\t^ N{0,T^Q^ip)). 

From (4.6) and using standard hierarchical Bayes techniques (cf. [30]) and 
matrix identities (cf. [39], page 151), it follows that 

y\a^T^ ~ pN{0, a^In + t''{^T'^^' + Q^(p))) 

^^■^^ + (1 - p)iV(0, + t\^T^^' + Q^(p))), 

(4.8) 7l2/, <y\r^-- p*N[A'^y, B^^) + (1 - p*)iV(Aiy, B^), 

where 

pmO(y) 
pm^(y) + (1 — p)m}{y) ' 

(4.9) = r2rcD'(a2^„ + T\(i>r^' + Q^(p)))-\ 

^ ^2pr- _ ^4pr^/(^2j^ ^ T2($r$' + g^(p)))-^$r'', 

where r = 0, 1 and m?{y), m^{y) denote, respectively, the normal density 
with mean vector and covariance matrices cr^I„ + t^[^T^^' + Q^ip)) and 
c724 + r2($ri$' + Q^(p)). 

At this point (4.8) allows us to produce an estimator of (3.4) once the hy- 
perparameters in and are set. Two possible ways of handling this are 
the following: first, to use diffuse prior parameters; or, second, treat the cur- 
rent priors as first stage priors and add additional hyperprior assumptions. 
The first approach produces generalized Bayes estimators. In the following 
section we will use the hyperprior approach. 

4.3. Hierarchical Bayesian modeling. In order to proceed to the second 
stage calculations, some algebraic simplifications are needed (cf. [1]). Spec- 
tral decomposition yields $r^$' + Qn{p) = H' D'^'H'' , where D''' = d\a.g{d\, 
d2,---,d'^) is the matrix of eigenvalues and the orthogonal matrix of 
eigenvectors for r = 0, 1. Thus, 

a^In + T2(<5r^.' + Q^{p)) = H'\a^In + t'^D'W 

= T^H'\vIn + D'-)H'\ 

where r = 0, 1 and v = jr^ . Using this spectral decomposition, the marginal 
density of y given and v can be written as 

m{y\T^,v) =pm^{y\T^,v) + (1 - p)m^ {y\T^ ,v) , 
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where 

m 

(4.10) 
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= (2vrr^)-"/4n(. + <)-^/4exp|--i,f:(<)' 



where = (wj', . . . , w^^)' = H'^'y for r = 0, 1 



1=1 



Example 4.4. In the case of 5^, using the spherical harmonics (2.4) 
and invoking S'0(2)-invariance on , we can decompose the eigenspace 
as in (4.3). Thus, dimiS^ = 1 and dimi5^ = 2k for k = K + 1, Further- 
more, K = ^f^o(dim£:^ + dim£:^) = Ef=o(2^ + I) = {K + if. This type of 
symmetry is often observed in directional data. 

Fix 

41 = [(^ + V2)(fc + + 2){k + 3)]"^ 
for ah j = 0,l,\l\<k, k = K+l,... and 



_ J I/- 



0, otherwise. 



Then 



Q™n^i,'^2) 



(4.11) 



(2^)^ 



(ig2(w'iW2) - \) 



K 



Y^{{k + 1/2)(A; + l)(fc + 2){k + 3))-Vfc(a;;c^2) 



fc=i 



where wi , u;2 G S"^ 



,2(-) = i M 1 + J^ 
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1 — tj; 



1 — u; 



•l_y;\3/2 (\-w\ 

12(— ^) +6 — — +1 



(4.12) 

for \'w\ < 1 and is the /cth Legendre polynomial, k G Nq. Similarly, 

-P/c(^^2) 



(A; + l/2)(A: + l)(A; + 2)(A; + 3)' 
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4.4. Second stage prior and estimation. To derive the function estima- 
tor, all that is now needed is to eliminate the hyper and nuisance param- 
eters from the first stage posterior distribution, by integrating out these 
variables with respect to their second stage prior. Since it is well known 
(cf . [6] , Chapter 4) that the final Bayes estimator does not depend crucially 
on the second and higher stage hyperpriors, these priors can be chosen to 
simplify computations. Accordingly, the priors on and v can be chosen 
as 7r2,i(r^) oc (r^)"'^; see [3]. With this choice of prior on r^, the marginal 
prior on 7 has the form 

EE E 

k=oj=o e=i Pm I 

EEE^ 

This prior density corresponds to the limiting case of a multivariate Student- 
t density (which has heavier tails than the likelihood function). The prior 
on V is chosen to be an F-distribution with a and b degrees of freedom 
satisfying the following conditions: 

• the prior variance of v (= a(b-4:)(b-2y^ ) infinite; 

• the Fisher information number (= 2°a-4)(a+fe+2) ) minimum; 

• the prior mode (= ^^f^§y) is greater than 0. 

This can be done by choosing 2 < 6 < 4 and a = 8(6 + 2)/(6 — 2). Let 7122 (t') 
denote the resulting prior density. 

Once the second stage priors are specified, using (4.10) and taking the 
expectation with respect to r^, the Bayesian estimator of 7 under G'^- 
invariance (r = 0) or non-G'^-invariance (r = 1) is given by 

(4.13) Y = r''^'H'-E'[ivIn + D')-^\y]H'-'y, 

and the expectation is taken with respect to 

a/2-1 ( n \ "1/2 / „ , ^,.2 \ -(n+2c-2)/2 

for r = 0, 1. Note that in order for vr22(w|y), r = 0, 1, to be proper densities, 
c should be chosen such that c < 6/2. Hence, under the mixture prior (4.4) 
and squared error loss, the Bayes estimator for 7 is given by 

(4.15) 7 = /70^(l-p*)7i. 
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Again, using (4.10), the posterior expected squared error loss of 7 can be 
written as 

Var(7|y) = p* VarO(7|?/) + (1 - p*) Var^ {j\y) + 2/(1 - p*){j^ - 7^)(7° - 7')', 
where 

1 



Var'"(7|y) 



n + 2c - 4 
1 



n I r\2 



E 

U=l 



V + di 



y 



n + 2c - 4 

+ E'[Y{v)nv)'\yl 



E 

L \j=l 



Ml: 



r = 0, 1. Since these expectations involve only one-dimensional integrals, 
they can be computed easily using one of the several standard techniques, 
such as Gauss quadrature, Monte Carlo or Laplace approximation. 
Finally, the function estimator/ of / is 

f{x)=p*P{x) + {l-p*)f\x), 



where 
(4.16) 
and 
(4.17) 



K 



~f{x) = Y.{l'Mx))l 

k=0 



K 



f\x) = Y.{^\M^))l 

k=0 



for X G M. Note that equation (4.16) corresponds to the Bayes estimator 
of / if one believes that the G'^-invariance assumption is true, that is, if 
p=l, and (4.17) is the Bayes estimator of / under the non-G'^-invariance 
assumption. 



4.5. Bayes factor and choice of K. We now describe how the optimal 
level of truncation K is to be determined. As indicated above in (3.4), the 
choice of K provides a model for the observations through the choice of the 
corresponding regression function. Denote the maximum truncation level by 
Kmax, so that Ef=r['i™'^fc + dim£:^] < n (cf. Section 3). 

Let denote the model arising from (3.4), (3.5) corresponding to the 
truncation level K. Our task is to pick the best model for the given data 
from the set of models 



TIk, K = 1,2,...,K, 
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The well-accepted method (cf. [37], Section 7.2.2) for deciding between 
two possible models is to compute their associated Bayes factor. As a basis 
of comparison, the larger model OJtxmax '^ill be used. Hence, we have to 
compute 

m{y\MK) 



(4.18) «B 



K ■ 



where m(y|9Jt/^) denotes the marginal density of y under the model dJlx, 
K = 1,. . . ,-fCmax- (Note that 53A'n,ax = From (4.7) it follows that, under 

y\a^T'^pN{0,a'ln + T\'^Kr%^'K + Qn,x(p))) 

+ {l-p)N{0, a^In + t\^kV\:^'k + QnAp))), 

where we have shown the dependence of F", and on K explicitly 
with subscripts. It follows then that 

m{y\mK) = I m{y\TtK,a'',T'')d7r{a\T''), 

where 7r((T^,r^) is the joint prior distribution on and r^. 

As in the previous section, consider the spectral decomposition of 
^kT'k^'k + Qn,K(p)^ for r = 0, 1. Let and i/^ be such that '^k^'k^'k + 
Qn xip) — ^k-^k^k' for r = 0, 1. Also, let ^ be the ith diagonal element 
of Dj^ and let w^^ = Hj^y = (u;^ i, • • • , for r = 0, 1. Then, using (4.14), 

the marginal density of y under OJt/^ can be expressed as 

m{y\mk) =pj m\y\mk,a\T^)dTr{a\T^) 

+ {l-p)j m\y\Tlk,a\T'')d7:{a'',T^), 



where 

rri{y\Tlk) 



..a/2-1 



(6 + au)('^+^)/2 



1/2 / n ^„r \2 \ -(n+2c-2)/2 



W 



\i=l 



V + d\ 



for r = 0, 1. Consequently, to choose the best model DJlx, one needs to com- 
pute m(y|9Jli^) for = 1, . . . , -fCmax- Then the best value of K (equivalently, 
the best model is the one for which *Bj<- is maximum. 

An alternative to the Bayes factor is to use Schwarz's criterion (cf. [38] 
and [37], Section 7.2.3) which can be viewed as an approximation to the 
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logarithm of the Bayes factor. Since C 97^2 C ••• C 9?t_Kniax, Schwarz's 
criterion can be written as 

(4.19) ei, = -log£„ + ^^^y^log(n), 

where denotes the ratio of the hkehhood functions under dJli and dJlj 
evaluated at the maximum likelihood estimator of 7 under both models and 
VTj (tTj) corresponds to the number of parameters in model 9Jtj The 
model Tli is preferred to the model Tlj if 6ij > 0. 

5. Frequentist properties of the hierarchical Bayes estimator. We may 

wish to model the nuisance parameter directly as in [1, 2]. One of the ad- 
vantages of this approach is that it allows one to make a direct comparison 
with the smoothing spline approach of Section 3.1. Although we can ap- 
proach this by imposing G*^-invariance priors as done previously, in order to 
ease the notation we will not impose G'^-invariance conditions, or, equiva- 
lently, as stated in Remark 4.1, we will assume invariance with respect to 
the trivial subgroup. 

5.1. Modeling the nuisance parameter directly. We begin by rewriting 
(3.6) as 

y = TV' + e, 

where T = [$,/], ip = {'y',r]'y. Following Angers and Delampady [1], we as- 
sume a multivariate normal prior 

where = {Q^{xi,Xj)) for z,j = l,...,n as defined in (4.5) with the r 
suppressed to save notation. 

By imposing a second stage prior as in Section 4.3 (see also [1]) on t/'O) a 
hierarchical Bayes estimator of can be written as 

(5.1) iP^y, = ET'HE[{vIn + Dy^]H'y, 

where H and D are such that THT' = HDH' , H is an orthogonal matrix 
and D is a diagonal matrix. 

Hence, using Corollary 2 of [1], a hierarchical Bayes estimator of / is 

(5.2) /hb(x) = (0(x)',g(x)')^hb, 



r 

Ql 



where (/'(x) and q{x) are defined in Section 3.1, x £ M. 
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5.2. Hierarchical Bayes estimator as a shrinkage estimator. Let ^/^is be 
the least squares estimator, that is, the solution of the normal equation 
T'Tip = T'y. We have the following, which shows that the hierarchical Bayes 
estimator (5.1) is a shrinkage estimator of the least squares solution. 

Lemma 5.1. 

V'hb = (/ - EKi;H-i + T'T)-i]Ms. 
Substituting Lemma 5.1 into (5.2), we have 

hu^) = {cp{xy,qixy){i-E[vivE-' + rTy^])i^,„ 

for x£M. 

5.3. Splines as limits of hierarchical Bayes estimators. In this section let 
us compare the hierarchical Bayes estimator (5.2) with the spline estimator 
of Theorem 3.1. Let us begin by writing 

iphh{v) = (dhb,Chb), 

where dhb = Ty^'^>'AQ^,, + <^r^'r)-^y, Chb = + $r$^)~'y, = $rV2 
and Qnv = vl + Q^. Thus, we can rewrite (5.2) as 

fbbix) = (pixYdhh + qixYchh 

for xeM. 

The comparison with the spline estimator of Theorem 3.1 comes from 
setting V = nS, and diffusing the parameter T. We use the notation || • ||op to 
denote the usual operator norm. We have the following result. 

Theorem 5.2. Suppose xi, . . . ,Xn G M locally satisfies the Cox assump- 
tions. Ifv = ni, ^ - n-2^/(2s+dimM) ||r||-pi^o, then 

i7;||/j^j^_/||2«„-2s/(2s+dimM) 

asn^oo, where f G Hs{M,M), M > and s > (dimM)(5 diniM - 2)/4. 

6. Application to long period cometary orbits. Let us illustrate the pro- 
cedure in the case of the 2-sphere, S'^. The data considered in this applica- 
tion consist of directed unit normals of the 658 single-apparition long period 
cometary orbits found in the catalogue of [33]. The object of interest is the 
distribution of the directed normals on 5^. 

This is a well-known directional data set and has been previously ana- 
lyzed in various ways by Jupp and Spurr [23], Watson [47], Fisher, Lewis 
and Embleton [15], Wiegert and Tremaine [49] and Mardia and Jupp [32]. 
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Recently, a thorough data analysis is performed in [22], where the reader can 
obtain more background about this data set. The main conclusions reached 
in [22] are the following: the data is S'0(2)-invariant around the North Pole, 
(0,0,1)', or, longitudinally invariant; and, that the data is observed with 
considerable selection bias whose direct impact is the rejection of spherical 
uniformity. Indeed, astronomers believe that the intrinsic distribution of the 
unit directed normals of the long period cometary orbit is spherically uni- 
form and until the Jupp, Kim, Koo and Wiegert [22] paper, it was never 
really understood why standard directional statistical tests rejected spheri- 
cal uniformity. It was found that the data has considerable selection bias and 
when that selection bias is accounted for, Jupp, Kim, Koo and Wiegert [22] 
show that one can no longer statistically reject null spherical uniformity. In 
addition to the above findings, with the techniques developed in this paper, 
we are now able to estimate a longitudinally invariant adaptive estimator of 
the density of the unit normal vectors to the cometary orbits, that is, the 
probability density of the observed data with selection bias. 

We will do so by using histosplines on 5^ , since these allow one to calculate 
densities as a regression problem; see [13]. This is done in the following way. 
Let 



7r(ii - 1) vrjA 
m ^ m J 



X 



27r(j2-l) 27rj2 



m m 



for ji,j2 = l,...,m. This partitions 5^ (with the exception of the South 
Pole) using (2.3). Thus, we seek the solution to u£ Hs{S'^) that minimizes 

mm „ 

(6.1) Yl Y^iynn- Ln^.^f +q 

jl = lj2=l 



where yjija the relative frequency of the data in Sj-^^j^ and 
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for ji, j2 = 1, ■ ■ ■ ,m. Other than some very minor modifications, the theory 
would go through exactly as presented in the paper. This is line with what 
may be called the general spline smoothing problem; see [46], page 10. 
The solution to the above general minimization problem (6.1) is 

u^{x) = (j){xyd + q{x)'c^ 



where 

4* = (^ilj2>fcg), 
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oo k 

Q'{uJl,UJ2)= ''k Y ykq{^l)ykq{^2), 

k=K+l q=-k 

h,i2,ji,j2 = 1, ■ ■ ■ ,m, \q\ < k, k = 0,1, . . . , K, x,uji,u!2 G S'^. Furthermore, 
= [(A; + l/2){k + l){k + 2){k + 3)]^/2 replaces = k{k + 1), which are 
asymptoticaUy equivalent as k^oo. This allows us to compute Q'' (0^1,102) 
in closed form, see (4.11) and (4.12), and is the usual way one approaches 
splines on S"^ (cf. [45] and [46], Chapter 2 and [31]). 

The above represents the spline solution without any adjustment for 
S'0(2)-invariance. If we want to invoke S'0(2)-invariance together with the 
hierarchical Bayesian structure, we would need to use the previous 5*0(2)- 
invariance formulation of Examples 4.2 and 4.4. Thus, all of our parameters 
are established and, therefore, we can employ the 5'<9(2)-invariant adaptive 
hierarchical Bayes estimator (4.15). 

6.1. Numerical results. We compute the Bayes estimator, (4.15), for the 
comet data for several values of p S (0, 1), ii' = 1, 2, . . . , 10 and m = 25. The 
Bayes factor (4.18) and Schwarz's criterion (4.19) are given in Table 1 along 
with the "best" value of p that maximizes the Bayes factor for each value of 
K. We notice that the Bayes factor is highest at K = 6, while Schwarz's 
criterion &k is highest at K = 4:. The two models are very similar and 
below we compute the Bayes estimator with K = 6 and p = 0.995. The 
update value, (4.9), is p* = 1 — 1 x 10"^*^, consequently it can be assumed 
that p* = 1, that is, a posteriori the Bayes model puts all its weight on the 
5'0(2)-invariance model and therefore has adapted to the invariance. An 
explanation of S'0(2)-invariance for the comet data is given in [22]. 

In Figure 1 we provide perspective and contour plots of the hierarchical 
Bayes estimator (with p* = 1 and K = 6). The top two panels are the per- 
spective plots, while the bottom two panels are the contour plots. In both 
sets of plots the domain is taken to be the equal area projection of S"^ viewed 
from the "North Pole" (left panels) and the "South Pole" (right panels). In 
particular, for each point {(p,'&) € S'^, where f G [0,27r) and G [0,7r), Lam- 
bert's equal area projection is defined by 



{x,y) = 2sin — (cos(/9,sin(^). 
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By varying t9 G [0, vr), the 2-sphere is mapped onto a disk of radius 2 with 
the origin being the "North Pole." The northern hemisphere is thus mapped 
onto the disk of radius \/2, while the southern hemisphere is mapped onto 
an annulus of inner radius \f2 and outer radius 2. In Figure 1 the left panels 
take the domain where i9 G [0,7r/2], so that we are viewing the northern 
hemisphere of the 2-sphere. We can also reverse the figures so that the South 
Pole is at the origin with the roles of the southern and northern hemispheres 
reversed by using 

= 2 sin ^ (cos(/9,sin(/?). 

Indeed, the right panels of Figure 1 are such that one is viewing the "South 
Pole" at the origin along with just the southern hemisphere. We note that 
both of these projections preserve area in the sense that dxdy = sin t!) dip cItD. 
In [22] a thorough discussion is given with regard to the equal area projection 
for the comet data. 

As a comparison, in Figure 2 we also plot in a similar way the spline 
solution where the value of has been chosen by cross-validation; see Remark 
3.2. 

From Figure 1 one can see that the hierarchical Bayes estimator (4.15) 
adapts to the S'0(2)-invariance. In fact, since for K = 6, p* ^ 1, we have that 
^ ~ ^0 given by (4.13). Furthermore, the highest concentration of the data 
is at the North and South Poles which is indicative of the cometary orbits 
having their orbital planes near the ecliptic plane. The 5'0(2)-invariance 
is represented by the circular contours. This then represents the estimated 
probability density of the unit normal vectors of the cometary orbits in 
the presence of selection bias as explained in [22]. We note that the spline 
method (see Figure 2) without any adjustment is picking up the peaks at 
the North and South Poles; however, S'0(2)-invariance is not particularly 
distinguishable since the contours do not appear very circular. 







Table 1 








Choice of K 




K 


P 






1 


0.8 


0.0000 


170.8492 


2 


0.9 


0.0000 


232.3569 


3 


0.8 


0.0000 


239.0247 


4 


0.995 


49.4024 


324.4495 


5 


0.995 


6.0496 


278.4142 


6 


0.995 


90.0171 


234.6898 


7 


0.995 


7.3810 


156.4761 


8 


0.995 


3.3201 


143.0733 


9 


0.995 


3.0042 


75.3813 



7. Proofs. 

this paper. 
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In this section we provide the proofs to all of the results in 



7.1. Proof of Theorem 3.1. The proof to Theorem 3.1 essentially follows 
from exhibiting linear independence. For this we need the following result. 

Lemma 7.1. On a compact connected Riemannian manifold M let 
Q{xi,x) = (j){xi)' (l){x) + Q^{xi,x), 




0.004 • 



0.002- 




-2 -2 



(c) (d) 




-1 -0.S 0.5 1 -1 -0.5 0.5 1 

Fig. 1. Perspective and contour plots of the hierarchical Bayes estimator. The left panels, 
(a), (c), have the "North Pole" as the origin with just the northern hemisphere as the 
domain, which is a disk of radius \/2, while the right panels, (b), (d), have the "South 
Pole" as the origin with just the southern hemisphere as the domain, which is a disk of 
radius \/2. 
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where xi, . . . , 3;„ G M are distinct and x G M. Then {Q{xi,x) : i = 1, . . . , n} 
is a linearly independent set in Hs{Wl) for s > dimM. 

Proof. We first need to regularize the problem as in Lemma 2.3 in [41]. 
For p G M, let {Op,'ipp) be a chart; see the Appendix. Now define 

^ f exp{-p(p, x)/[e - p{p, x)]}, if p{p, x) < e, {p{p, x) < e} C Op, 
\ 0, otherwise, 



m (6) 




-1 -0.S 0.6 1 -1 -0.5 0.5 1 



Fig. 2. Perspective and contour plots of the spline estimator. The left panels, (a), (c), 
have the "North Pole" as the origin with just the northern hemisphere as the domain, 
which is a disk of radius \/2, while the right panels, (b), (d), have the "South Pole" as the 
origin with just the southern hemisphere as the domain, which is a disk of radius \/2. 
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where p{-,-) is the Riemannian metric; see the Appendix. Notice that we can 
shrink the compact support of f^^p around the compact closure of a smah 
open neighborhood around p G M just as we would do in the Euclidean case. 
This will enable us to regularize data. 

Consider xi, . . . ,x„ distinct points in M and choose e such that 

< e < min p{xi-^,Xi^)/2, 

for 21,^2 = 1, ... ,n. 
Define 

where Ui € C°°{M) and Ui G £^(M) for i = 1, . . . , n and x G M. We note that 
) — ^ij, where 6ij denotes the Kronecker delta for i,j — l,...,n. 
Suppose that 

n 

(7.1) 5]aiQ(xi,-) = 0. 

i=l 

For g,h€ Hs{M) let 

K oo 

k=0 k=K+l 

for s > dim(M)/2 and K >O.We note that 

(7.2) {Q{xj,-),Ui{-)) =Ui{xj) 
for i,j = 1, . . . ,n. By applying (7.2) to (7.1), we get 

n n 

= Yaj{Q{xj, ■),Ui{-)) = YajUi{xj) = Ui 

for i = 1, . . . ,n. Thus, the lemma follows. □ 

Proof of Theorem 3.1. We note that the n x n matrix = (Q^ x 
(xi-f^^Xi^)) is positive definite and invertible. Thus, the n x n matrix Q^^ 
is invertible for all ^ > 0. Now $ has rank k <n and by Lemma 7.1, for 
^ > we can apply Theorem 1.3.1 from [46]. The result is the solution to 
the smoothing problem. □ 



28 



J.-F. ANGERS AND P. T. KIM 



7.2. Proof of Theorem 3.6. The construction of the proof is to locahze 
the arguments over an atlas, do the calculations locally within each chart 
by applying Theorem 6.2 of Cox [11], and piece together the final argument 
by using a partition of unity argument; see the Appendix for the technical 
terms. 

Let O = {(Oajipa) '■ ct £ A} be an atlas and consider V = {6a '■ a E A}, a 
partition of unity subordinate to the open sets of the atlas. For a fixed 5a, 
let Ua equal the number of Xj £ supp^Q,, j = 1, . . . ,n, where the overline in 
this case means closure. Thus, let z", = ^pa{xj^) for i = 1, . . . ,na- 

By the local version of Assumption 3, we can, without loss of general- 
ity, assume that the design S M is a random sample from the 
uniform distribution on M. This means that z", , i = 1, . . . ,na, has distri- 
bution \^^p~^ {z)\ dz , where |f?'i/'Q"^(z)| is the Jacobian of the transformation 
ipa{x) = z. Since we are assuming that assumptions 2, 3 and 4 of [11] are 
locally satisfied, hence, are satisfied on the chart in question, it follows from 
Theorem 6.2 of [11] that 



IE/ \6aif^ - f)ix)\' dx 



= E \Sa{f^-f)o^-\z)\'\di;-\z)\dz 

Now the a ssumption t hat there exist constants < c < \^^p~^{z)\ <C< oo, 
for all z E ipa{supp6a), allows us to assume 

SUp|^„,(t)-^,(i)|xn-VdimM^ 



where the supremum is taken over il)a{su.pp5a), is the empirical dis- 
tribution function of z'j., i = 1, . . . ,na, and '^aidt) = \dipa^{t)\ dt. Thus, as 
stated in [11], page 810, in order to satisfy assumption 2 we need 

s > (dim M) (5 dim M - 2) /4 
in order to be able to choose ^ x ^^2s/(2s+dimM) that, over the chart 
(Oa^ipa), we get the asymptotic minimax rate of j^^2s/(2s+dimM) ^ _^ 
aeA. 

The final argument is to use the partition of unity argument to piece 
together the integration over all of M. Indeed, 



E 



/ \f^{x)-f{x)\'dx=J2E[ \6a{fi-f){x)\'dx 
Jm Jo^ 

\5aifi-f)oij-\z)\'\di^-\z)\dz 
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-2s/{2s+dimM) 
"a 

a&A 

/ \ -2s/{2s+dimM) 

« E^H 

\a£A / 
< j^-2s/(2s+dimM) 

where we are using the fact that A can be taken to be finite, because M is 
compact and by the fact that X^ae^^a > n. 

7.3. Proof of Theorem 3.7. The proof is argued along the hues used in 
[46], Chapter 1. From the Gaussian assumption, we note that 

¥.Mx)y = VT^4>{x)+T''q{x), 

where 

q{x) = {Q^{xi,x),...,Q^{xn,x)y, 

X S M. Furthermore, 

Eyy' = z^T^^^' + T^Q^ + a^In- 
Setting ^ = a'^ /nr'^, we have 
E{Mx)\yi, . . . ,y„) = <A(x)V«I>'(i.ci>(|.' + Ql^y^y + q{xy{u<^<^' + Q^^T^y 

for j; G M. Note that 
and 

as V ^ oo. 

7.4. Proof of Lemma A.S. Now 

oo 

Cov(r?^^,r?^J =pV2 E iMxh),^i^(pkixi2))l 
k=K+l 



k=K+l 
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for j = 0, 1. Since O],'' = diag(z^;^^) and i^l^^ < X).^, £=!,..., dim<5[, k = K + 
1, . . . , j, r = 0, 1, we have that 



\'{(t>k{Xi^)Ak{Xi^))k 

k=K+l 



\k=K+l 



1/2 



hiXi^)\\k 



\k=K+l 



1/2 



kiXi2)\\k 



< T'^^JZ{Xi^,s)Z{Xi2,s), 



where Z{x,s) = J2k>o^k^\\^k{x)\\1 is the zeta function of A. It is known 
that Z{x,s) is a continuous function of x for fixed s > dim(M)/2 [34]. Since 
M is compact, there exists a constant C(M, s) < oo depending only on M 
and s such that sup^gMZ(x,s) < C(M, s). Hence, {Qn)i,j ^ C(M, s) for ah 
i,j = 1,2,. ..,n. 

7.5. Proof of Lemma 5.1. Using a standard matrix identity (cf. [39], 
page 151) and omitting the expectation to ease notation, we can write (5.1) 

as 



hb 



irH[{vIn + D)-^]H'y 



iT'{vHInH' + HDH 
IT'ivIn + TET'y^y 
v-^E]T{I + T[v-^E]T')-^y 

^;H-i + T'T)~i(T'T)'i/'is 
yE"^ + T'Ty\[vE^^ + T'T] - uH-^)V'is 
I-v{vE-^ + rTy')i;i,. 

7.6. Proof of Theorem 5.2. Again, we wih omit the expectation to ease 
notation. Now 

^PM=ErH{vIn + D)-^H'y 

= ET'ivHInH' + HDH'Y^y 
= ET'ivIn + TET'y^y 
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y 



Qn 

where = vl + and $r = $r^/^. Hence, the first k components of 
'4'hh{v) are given by 

= ri/2ci>'r(Q^_, + $r<J>r)"'y, 

and the last n ones are 

Qn,.(Qn,. + ^r^'r)~'y. 

Using Angers and Delampady [1], the hierarchical Bayesian equivalent of d 
of Theorem 3.1 is then 

= (r-i + a>'(Q^,j-i$)-ici>'(Q^j-iy 

if IIFIIqp^ ^ and v = n^. Similarly, the hierarchical Bayesian equivalent of 
c of Theorem 3.1 is given by 

= iQll,.)-'[i - '^r'^'AQt, + <^r^'r)-'QlMt)-']y 

= iQn,vr'[i - Hr-' + <^'iQ^,,r'^r'<^>'iQ^,,r']y 

c 
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if ||r||~p^ and v = n^. 

Consequently, the spline estimator of Theorem 3.1 is a limiting case of 
the hierarchical Bayes estimator (5.2) 

(7.3) /j^j,(;^)^/-(;^), 

for X G M, when v = n^ as ||r||op^ 0. Now we know that 

^ll/hb - /II < i^ll/hb - fill + E\\f^ - f\\. 

By (7.3), we know that £;||/hb - f^\\ ^ as ||r||-pi ^ for ^; = Thus, the 
result follows. 

APPENDIX 

In this appendix we gather together some further technical discussions 
that concern M and that are directly or indirectly used in the paper. Con- 
sider a smooth curve 7 : [a, b] M, with a <b, and let j'{t) denote its first 
derivative for some t S (a, 6). Then the length of 7 is defined through the 
Riemannian structure as 



Since we are assuming that M is connected, hence, for any two points p,q £ 
M, we can find a curve in M that joins them in M, we can define a metric 
on M by 

p{p, q) = inf{;(7) : 7 joining p and q}, 

p,q £ M. This metric is called the Riemannian distance or metric which 
makes (M.,p) a metric space ([7], page 39). 

For p G M, let (Oa, V'a) be a chart, that is, Oa C M is an open set with 
peOa and ipa-Oa^ tpa{Oa) C RdimM jg ^ diffcomorphism. A cohection of 
charts {(O^, V'a) : a G -4} is called an atlas if IJ^ Oa = M. Since M is assumed 
to be compact, we can take ^ to be a finite set. Thus, the open sets of an 
atlas form a finite open cover of M. Consider a collection V of nonnegative 
functions 6a whose support is contained in Oa and that has the property 
that 1 = ^a- We call such a collection a partition of unity subordinate to 
the open cover, and for some integrable functions /, ^:M— >R, integration 
is defined by 

(A.l) / f{x)g{x)dx=Y, j foi;~\z){5ag)o^^Hz)dz. 

We note that this definition is well defined, hence, is independent of the 
choice of open cover ([7], page 6). 
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A famous formula due to Hermann Weyl states 
lim —, ^ -, = 



(E,=odim£:,)2/dimr 

where 

(A.2) W 



(2^)dimMr(i + dimM/2) 

([7], page 9). We note that the W appearing above is the same quantity 
which appears in the minimax constant of Theorem 3.4. 
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